Abstract. We present a study of the effect of High Column Density (HCD) systems on
the Lyα forest correlation function on large scales. We study the effect both numerically, by
inserting HCD systems on mock spectra for a specific model, and analytically, in the context
of two-point correlations and linear theory. We show that the presence of HCDs substantially
contributes to the noise of the correlation function measurement, and systematically alters
the measured redshift-space correlation function of the Lyα forest, increasing the value of
the density bias factor and decreasing the redshift distortion parameter $\beta_\alpha$ of the Lyα forest.
We provide simple formulae for corrections on these derived parameters, as a function of the
mean effective optical depth and bias factor of the host halos of the HCDs, and discuss the
conditions under which these expressions should be valid. In practice, precise corrections to
the measured parameters of the Lyα forest correlation for the HCD effects are more complex
than the simple analytical approximations we present, owing to non-linear effects of the
damped wings of the HCD systems and the presence of three-point terms. However, we
conclude that an accurate correction for these HCD effects can be obtained numerically and
calibrated with observations of the HCD-Lyα cross-correlation. We also discuss an analogous
formalism to treat and correct for the contaminating effect of metal lines overlapping the Lyα
forest spectra.

1 Introduction

Measuring the correlation function of the Lyα forest in redshift space from multiple
spectra is emerging as a powerful tool to explore the large-scale structure of the universe
at high redshift. This development has been led by the BOSS survey, part of the SDSS-III
collaboration [1], which is obtaining optical spectra of 160,000 quasars at $z > 2.1$ for the
principal purpose of studying the Lyα forest absorption and measuring its power spectrum.
The redshift-space power spectrum of the fluctuations in the fraction of transmitted flux,
$F$, has a complex form on small scales that is affected by non-linear gravitational evolution,
thermal broadening, the non-linear relation between $F$ and the optical depth, and complex
physical processes such as galactic winds. But on large scales, the power spectrum should
be simply related to the mass power spectrum in the linear regime, $P_L$, through two biasing
parameters [2]:

$$P_F(k, \mu_k) = b_\alpha^2 \left(1 + \beta_\alpha \mu_k^2\right)^2 P_L(k) , \tag{1.1}$$

where $k$ and $\mu_k$ are the modulus and angle cosine relative to the line of sight of the wave
vector in redshift space, $b_\alpha$ is the bias factor relating the amplitude of fluctuations in $F$ to
the relative amplitude of density fluctuations, and $\beta_\alpha$ is the redshift distortion parameter.
This form of the linear power spectrum in redshift space is the same as that for discrete
tracers of the density field [3], except that $\beta_\alpha$ depends also on the bias parameter for the
peculiar velocity gradient, $b_\eta$. Recently, the first measurement of $b_\alpha$ and $\beta_\alpha$ for the Lyα forest
was reported by [4] from the first year of BOSS data, and more accurate measurements are
expected in the near future.

The values of $b_\alpha$ and $\beta_\alpha$ as a function of redshift can be predicted in principle from
numerical simulations of the Lyα forest [2, 5], and they depend on the detailed small-scale
physical processes in the intergalactic medium. Comparison of the predicted values with the
observed ones will therefore test these physical processes. However, in practice the observed
absorption spectra are affected not only by the low-density gas producing the Lyα forest,
but also by higher density systems that give rise to absorption lines of high column density,
observed as Lyman limit systems (hereafter LLS, with column densities $N_{\rm HI} > 10^{17.2}\,{\rm cm}^{-2}$)
and damped Lyα systems (hereafter DLA, with $N_{\rm HI} > 10^{20.3}\,{\rm cm}^{-2}$). These systems, as well
as the lower column density Lyα forest, also produce metal absorption lines, some of which
appear in the region of the Lyα absorption and add to the contamination of the measurement
of the Lyα power spectrum.

The presence of high column density systems (hereafter referred to as HCDs, meaning
both LLS and DLAs) has a similar effect on the Lyα power spectrum as the well-known
"fingers of God" in galaxy redshift surveys: on small, non-linear scales, galaxies accumulate
in high-density clusters with an internal velocity dispersion, appearing in redshift space as
highly elongated structures along the line of sight. This induces contours of the correlation
function that are also elongated along the line of sight on small scales, precisely the opposite
of the squashing effect on the correlation function contours induced by the Kaiser linear term
in the power spectrum that is prevalent on large scales. In the case of absorption spectra,
the damped wings of the HCDs may similarly spread the correlation function along the line
of sight. However, contrary to the "fingers of God" in galaxy surveys, the effects of damped
wings extend out to all large scales in the Lyα forest, owing to their power-law absorption
profiles. Metal lines can also cause an elongation of contours when they overlap the Lyα
forest and introduce bumps in the correlation function near the line of sight around the
separation that corresponds to the wavelength difference between the metal and the Lyα
lines. In addition to this effect, there is also the purely linear fact that if the HCDs have
a different redshift distortion factor than the Lyα forest, the correlation of the combined
transmission will display an averaged redshift distortion factor of the absorber populations
that are contributing to the total absorption.

While most previous work studied the effect of HCDs on the power spectrum along the
line of sight [6, 7], this paper focuses on the impact of HCDs on the linear bias factors of
the Lyα forest. Their effect on the measured power spectrum is determined by the fact that
HCDs are correlated with the underlying mass distribution and therefore with the Lyα forest
intergalactic absorption. Their presence also adds noise to any power spectrum
measurement. The impact of metal-line absorbers is also important and was briefly discussed
in [4]. Here we present a description of the expected effect, and we describe a method to
correct for it in the Lyα correlation measurements.

In Section 2 we present a method to introduce HCD systems in Lyα mock spectra.
The effect of HCDs on the measurement of the Lyα forest correlation inferred from mocks
is presented in Section 3. An analytical description of this effect is given in Section 4.
Finally, the impact of metal lines is discussed in Section 5.

A standard flat ΛCDM cosmology is used in this paper with the following parameters:
$h = 0.72$, $\Omega_m = 0.281$, $\sigma_8 = 0.85$, $n_s = 0.963$, $\Omega_b = 0.0462$.

2 Model for the High Column Density systems 

The impact of HCDs on the correlation function of Lyα absorption depends on their column
density and Doppler parameter distribution, and on the way they are distributed in space
relative to the underlying Lyα forest. In this section, we describe the method we use to
introduce these systems in mock Lyα absorption spectra. We first briefly summarize the
method to generate the Lyα forest mock spectra [8]. Then we describe our model distributions
for HCDs, and the way they are inserted in the mock spectra with a correlation with the
Lyα absorption field.

2.1 Lyα mock spectra

The reader is referred to [8] for a full account of the method we use to generate mock
Lyα spectra with any specified three-dimensional flux power spectrum and flux probability
distribution function. Here we highlight the features that are most important for this paper.
The method consists of two steps:

• A Gaussian random field $\delta_g(\mathbf{x})$ is generated for the set of specified lines of sight.

• The field is transformed to a new variable $F(\delta_g)$ constrained to the range $0 < F < 1$,
determined by the condition of matching the model probability distribution of $F$. The
power spectrum for the Gaussian variable $\delta_g$ is chosen so that the final flux power
spectrum of $F$ is the desired one.

In general, as described in [8], a third step can be applied where one interpolates the
value of $F$ between lines of sight generated at different redshifts to simulate the effect of
redshift evolution and the fact that the lines of sight are not parallel. This third step is not
included in this paper. The lines of sight are generated as parallel lines at a fixed redshift
of $z = 2.3$, with a value of the mean transmission fixed at $\bar F_\alpha = 0.791$, in order to study the
effect of the HCD systems without introducing additional complexities into the mocks.
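
The two steps listed above can be illustrated with a minimal sketch. It is not the procedure of [8]: the Gaussian field here is uncorrelated (no power spectrum is imposed), the transformation is assumed to be a lognormal optical depth, and the names `flux_from_gaussian`, `tune_amplitude` and the width `SIGMA = 1.5` are hypothetical choices made only for illustration. The one ingredient taken from the text is the target mean transmission of 0.791.

```python
import math
import random

def flux_from_gaussian(delta_g, amplitude, sigma):
    """Map Gaussian values to F = exp(-tau), with tau = amplitude * exp(sigma * delta_g)."""
    return [math.exp(-amplitude * math.exp(sigma * g)) for g in delta_g]

def tune_amplitude(delta_g, target_mean_flux, sigma, lo=1e-4, hi=1e2, iterations=50):
    """Bisect in log(amplitude) until the mean of F matches the target."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        flux = flux_from_gaussian(delta_g, mid, sigma)
        if sum(flux) / len(flux) > target_mean_flux:
            lo = mid  # too transparent, so raise the optical depth
        else:
            hi = mid
    return math.sqrt(lo * hi)

rng = random.Random(0)
# Step 1: Gaussian field (uncorrelated stand-in for delta_g along one line of sight).
delta_g = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Step 2: transform to 0 < F < 1 and match the mean transmission quoted in the text.
SIGMA = 1.5  # assumed lognormal width, not a value from the paper
amplitude = tune_amplitude(delta_g, 0.791, SIGMA)
flux = flux_from_gaussian(delta_g, amplitude, SIGMA)
mean_flux = sum(flux) / len(flux)
```

In a realistic mock the Gaussian field would carry correlations chosen so that the power spectrum of $F$ comes out as desired; the sketch only shows how the one-point transformation pins down the mean transmission.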

We use here the same flux power spectrum $P_\alpha(k, \mu_k)$ from [2],

$$P_\alpha(k, \mu_k) = b_\alpha^2 \left(1 + \beta_\alpha \mu_k^2\right)^2 P_L(k)\, D(k, \mu_k) , \tag{2.1}$$

where $P_L(k)$ is the linear matter power spectrum, $\mu_k$ is the cosine of the angle of the Fourier
vector from the line of sight, and $D(k, \mu_k)$ is a small-scale non-linear term that was fitted to
the results of numerical simulations in [2]. We use the central values for the model parameters
from the first row of Table 1 in [2], after applying a small correction to the bias parameter for
the difference in redshift (the biases in [2] are computed at $z = 2.25$) by assuming that the
power spectrum amplitude evolves as $(1 + z)^{3.8}$ [4], which implies, neglecting the influence of
the cosmological constant on the growth factor at this redshift, that the bias $b_\alpha$ evolves as
$(1 + z)^{2.9}$. The resulting bias parameters are $b_\alpha = -0.1375$ and $\beta_\alpha = 1.58$.
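
The step from the power spectrum scaling to the bias scaling can be checked with a few lines. The sketch assumes, as the text does, a growth factor proportional to $1/(1+z)$, so that $P_F \propto b_\alpha^2 D^2$ gives $b_\alpha \propto (1+z)^{p/2+1}$ for $P_F \propto (1+z)^p$. The function names are illustrative only.

```python
def bias_exponent(power_exponent):
    """Exponent of (1+z) for the bias when P_F ~ (1+z)^p and growth ~ 1/(1+z)."""
    return power_exponent / 2.0 + 1.0

def rescale_bias(bias_ref, z_ref, z_new, power_exponent=3.8):
    """Rescale a bias measured at z_ref to z_new with the power-law evolution above."""
    ratio = (1.0 + z_new) / (1.0 + z_ref)
    return bias_ref * ratio ** bias_exponent(power_exponent)

exponent = bias_exponent(3.8)           # 2.9, as quoted in the text
factor = rescale_bias(1.0, 2.25, 2.3)   # multiplicative correction from z = 2.25 to z = 2.3
```

The correction factor is about 1.045, so the quoted $b_\alpha = -0.1375$ at $z = 2.3$ corresponds to a bias close to $-0.1315$ at $z = 2.25$ under this scaling.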



2.2 Column density distribution and Doppler parameters 



A large part of the contamination by HCD systems on the Lyα forest correlation arises from
the damped wings, which depend exclusively on the column density. It is therefore most
important to use a model that reproduces the observed column density distribution of these
systems. The large number of quasars observed in the Sloan Digital Sky Survey has in recent
years allowed good determinations of this distribution [9, 10].
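
To make the dependence of the damped wings on column density concrete, the following sketch evaluates the Lorentzian wing of the Lyα line. It is an illustration under stated assumptions, not part of the model used in the mocks: the atomic constants (oscillator strength 0.4164, damping constant $6.265 \times 10^8\,{\rm s}^{-1}$, rest wavelength 1215.67 Å) are standard values that do not appear in the text, and the pure-wing limit $\tau = a/\Delta\nu^2$ is assumed, for which the equivalent width in frequency is $2\sqrt{\pi a}$ and no Doppler parameter enters.

```python
import math

SIGMA_CLASSICAL = 2.654e-2    # pi e^2 / (m_e c) in cm^2 Hz
F_OSC = 0.4164                # Lya oscillator strength (standard value, assumed)
GAMMA = 6.265e8               # Lya damping constant in s^-1 (standard value, assumed)
LAMBDA_LYA_CM = 1215.67e-8    # rest-frame Lya wavelength in cm
C_CM_S = 2.998e10             # speed of light in cm/s

def wing_coefficient(n_hi):
    """Coefficient a in tau = a / dnu^2, for column density n_hi in cm^-2."""
    return n_hi * SIGMA_CLASSICAL * F_OSC * GAMMA / (4.0 * math.pi ** 2)

def damped_wing_tau(n_hi, dlambda_angstrom):
    """Optical depth at a rest-frame offset dlambda (Angstrom) from line centre."""
    dnu = C_CM_S * dlambda_angstrom * 1e-8 / LAMBDA_LYA_CM ** 2
    return wing_coefficient(n_hi) / dnu ** 2

def damped_equivalent_width(n_hi):
    """Rest-frame equivalent width in Angstrom in the pure damped-wing limit."""
    w_nu = 2.0 * math.sqrt(math.pi * wing_coefficient(n_hi))
    return w_nu * LAMBDA_LYA_CM ** 2 / C_CM_S * 1e8

w_1e20 = damped_equivalent_width(1e20)        # roughly 7.3 Angstrom
w_dla = damped_equivalent_width(10 ** 20.3)   # at the conventional DLA threshold
```

The equivalent width scales as $N_{\rm HI}^{1/2}$ and the wing optical depth falls only as the inverse square of the offset, which is why these profiles reach large line-of-sight separations.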




Figure 1. Left: Number of HCD systems per unit of column density $N_{\rm HI}$ and unit of "absorption
distance" as defined in the text. The lines show the values for our model, at $z = 2.2$, $z = 2.6$,
$z = 3$ and $z = 3.4$. The vertical line indicates the standard separation between DLA and LLS. Points
with errorbars show the observational determination in [10], with a central redshift of $z \sim 3$. Right:
Number of systems per unit of redshift in the indicated column density ranges as a function of redshift.

Here we use the neutral hydrogen column density distribution used in [6], which is based
on an analytical expression derived in [11] that assumes an intrinsic power-law distribution
of the total hydrogen column density and takes into account the self-shielding effects on the
neutral column density, and is calibrated to match the observations of DLAs in [9]. In Figure
1, the column density distribution in this model is shown at different redshifts (left panel),
together with the observations of [10]. In this figure we plot the frequency distribution per
unit of column density and "absorption distance" $X$, defined as

$$dX = \frac{H_0}{H(z)} (1 + z)^2\, dz , \tag{2.2}$$

since this is the function presented in the SDSS analysis. Self-shielding causes the flattening
of the distribution in the column density range of $10^{18}$ to $10^{20}\,{\rm cm}^{-2}$. In the right panel we
plot the number of systems as a function of redshift integrated over various column density
ranges.
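
As a short illustration of the absorption distance, the sketch below evaluates $dX/dz$ and integrates it for the flat cosmology adopted in this paper ($\Omega_m = 0.281$). Radiation is neglected and the integration uses Simpson's rule; the function names are illustrative.

```python
import math

OMEGA_M = 0.281  # matter density of the flat cosmology quoted in the Introduction

def hubble_ratio(z):
    """H(z)/H0 for a flat universe with matter and a cosmological constant."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + 1.0 - OMEGA_M)

def dX_dz(z):
    """Absorption distance per unit redshift, dX/dz = (1+z)^2 H0 / H(z)."""
    return (1.0 + z) ** 2 / hubble_ratio(z)

def absorption_distance(z_min, z_max, steps=200):
    """Integrate dX/dz between two redshifts with Simpson's rule (steps must be even)."""
    h = (z_max - z_min) / steps
    total = dX_dz(z_min) + dX_dz(z_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dX_dz(z_min + i * h)
    return total * h / 3.0

slope_at_mock_redshift = dX_dz(2.3)       # about 3.3
path = absorption_distance(2.2, 2.4)      # absorption distance across dz = 0.2
```

At the redshift of the mocks a unit of redshift corresponds to roughly 3.3 units of absorption distance, which is the factor needed to convert a frequency per unit $X$ into a number of systems per unit redshift.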

Figure 2. Mean absorption caused by HCD systems for several values of $b_D$. Lower lines include
systems with $N_{\rm HI} < 10^{20.3}\,{\rm cm}^{-2}$ only; upper lines include all systems.

As discussed in section 4, the perturbation caused by HCD systems on the Lyα correlation
increases with their contribution to the mean effective optical depth, $\bar\tau_{eH} = -\log(\bar F_H)$,
which depends on the velocity dispersions in addition to the column densities of the systems.
Here we calculate its value as a function of redshift for different values of the Doppler
parameter $b_D$, and the column density distribution used in this study. Defining $W(N_{\rm HI}, b_D)$
as the rest-frame equivalent width according to the standard curve of growth of an absorber
with a Gaussian distribution of velocities, the mean absorbed fraction by HCDs (assuming
their positions are uncorrelated) is

$$\bar\tau_{eH}(z) = \int dN_{\rm HI}\, f(z, N_{\rm HI})\, \frac{W(N_{\rm HI}, b_D)}{\lambda_\alpha}\, (1 + z) . \tag{2.3}$$

This effective optical depth is plotted in Figure 2 as a function of redshift, separately
for all the systems (HCD) and including only systems with $N_{\rm HI} < 10^{20.3}\,{\rm cm}^{-2}$ (designated
here as LLS), for three different values of $b_D$. The contribution to $\bar\tau_{eH}$ from systems that are
not included in the definition of DLAs is about half of the total, and increases with $b_D$.
At the redshift of our mocks, the total effective optical depth from HCDs is close to 2%.

In the mocks of this paper we use a value of $b_D = 70\,{\rm km\,s^{-1}}$, a representative value for
DLAs (see [12]). We note that the value of $b_D$ actually has a large dispersion and its mean
depends on the column density (being smaller for lower column density systems). We shall
not examine how the effects on the correlation function that we study depend on
the distribution of $b_D$, but we note that most of these effects arise from the damped wings,
which are unaffected by the velocity dispersion. Better observational constraints on $\bar\tau_{eH}$ in
the future will help quantify the effect of HCDs more accurately.

2.3 Clustering of HCDs



While the effect of HCDs on increasing the noise in the measurement of the Lyα correlation
can be adequately estimated by simply placing the absorption systems randomly in the mock
spectra, the systematic effect on the correlation is induced only by their clustering. On large
scales, this systematic effect on the total measured correlation should be governed by the
bias factors of the HCDs.

Here we use a simple method to insert these systems in the mock spectra, by placing
them only in a fraction $\nu$ of pixels where the optical depth is above a certain threshold,
$\tau > \tau_c$. For a fixed distribution function of $F = \exp(-\tau)$, the value of $\nu$ determines the
critical optical depth $\tau_c$ or, equivalently, a critical transmitted flux fraction $F_c$:

$$\nu = \int_{\tau_c}^{\infty} d\tau\, p_\tau(\tau) = \int_0^{F_c} dF\, p_F(F) . \tag{2.4}$$

The dependence of the probability distribution function $p_\tau$ on redshift implies that the
threshold $\tau_c$ for hosting a HCD depends also on redshift. In this paper we do not include redshift
evolution to avoid complications, and we analyze the effect of HCDs at the single redshift
$z = 2.3$, although our method generally incorporates redshift evolution when detailed mocks
of the BOSS data are required. Here we generally use $\nu = 0.01$ (which yields $\tau_c \simeq 12$ at
$z = 2.3$), except for one model where we increase the HCD bias by changing to $\nu = 0.002$.
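
The threshold defined by equation (2.4) is simply a quantile of the pixel distribution, as the following sketch shows. The flux values are drawn from a hypothetical lognormal optical depth distribution (amplitude 0.2, width 1.5) chosen only for illustration, so the resulting $\tau_c$ is not the value quoted in the text; `flux_threshold` is an illustrative helper, not a routine from [8].

```python
import math
import random

def flux_threshold(flux_pixels, filling_fraction):
    """Return F_c such that a fraction `filling_fraction` of pixels has F <= F_c."""
    ordered = sorted(flux_pixels)
    n_host = max(1, round(filling_fraction * len(ordered)))
    return ordered[n_host - 1]

rng = random.Random(2)
# Hypothetical lognormal optical depths standing in for a mock Lya forest.
flux = [math.exp(-0.2 * math.exp(1.5 * rng.gauss(0.0, 1.0))) for _ in range(10000)]

nu = 0.01
F_c = flux_threshold(flux, nu)
tau_c = -math.log(F_c)

# Pixels eligible to host an HCD, and one random placement among them.
host_pixels = [i for i, f in enumerate(flux) if f <= F_c]
hcd_pixel = rng.choice(host_pixels)
```

Lowering $\nu$ moves the threshold to rarer, more strongly absorbed pixels, which is how the HIGH BIAS model described later raises the bias of the inserted systems.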

Figure 3. Left: Example of mock Lyα absorption field $F$ (red) and the threshold $F_c$ to host a
HCD for $\nu = 0.01$ (dotted blue). Right: Mock spectrum for the same line of sight. The green line
is the quasar continuum, and the blue line is the absorption due to the Lyα forest, smoothed with the
spectrograph resolution as described in [8]. The red line includes absorption by a HCD.

The HCDs are randomly inserted in the mock spectra in pixels with $\tau > \tau_c$, assigning
column densities to them that follow the distribution shown in Figure 1. A typical mock
spectrum is shown in Figure 3. The left panel shows the spectrum of $F$ due to the Lyα
forest and the value of $F_c$. Systems can only be located in a few narrow spikes in optical depth
covering a fraction $\nu$ of the spectrum. The right panel shows the spectrum in linear units
after multiplying by the quasar continuum and smoothing with the resolution of the BOSS
spectrograph, as described in [8]. A HCD has been randomly assigned to one of the peaks
that cross the threshold in the left panel (in this case, the peak at $\lambda \simeq 4260$ Å), and has
been included in the total absorption.

As shown in Appendix A, the large-scale clustering of the HCDs inserted in this way
in the mock spectra follows linear theory with a bias factor $b_h$ that depends on $\nu$ and on
the probability distribution function of $F$ (which is modelled as a lognormal function in our
mocks, as described in [8]), and with a redshift distortion parameter equal to that of the
Lyα forest, $\beta_h = \beta_\alpha$. We note that this is a consequence of the simple procedure we use for
inserting the HCDs, and that in reality their bias factor and redshift distortion parameter
should depend on the distribution of their host halos and the selection effects involved in
their detection. For our fiducial value of $\nu = 0.01$, the bias of the HCDs inserted in our
mocks is $b_h = 1.21$, while for a more extreme value of $\nu = 0.002$ the bias is $b_h = 1.43$.



3 Effect on the measured Lyα correlation function

We now investigate the impact of HCDs on the Lyα correlation function, by measuring it
directly in mock spectra. We generate 100 realizations of a mock survey with an area of
200 deg$^2$. Quasars are distributed following the luminosity function from [13], with a total
quasar density of 22 deg$^{-2}$, over the redshift range $2.15 < z < 3.5$. We use the same definition
of the Lyα forest as [4], i.e., the rest-frame wavelength range 1041 Å to 1185 Å. We also apply
a cut at an observed wavelength of 3600 Å, close to the end of the BOSS spectrograph.

As previously mentioned, the Lyα absorption mock spectra are generated with no redshift
evolution and assuming that the lines of sight are parallel. HCD systems are inserted
with the method explained above, and we measure the correlation function in 150 linear bins
in $r$ of width $1\,h^{-1}$ Mpc, and 20 linear bins in $\mu = \cos\theta$. The correlation function in each bin
$A$ is estimated by averaging over all pixel pairs with a separation that is within the bin $A$,

$$\hat\xi_A = \frac{\sum_{i,j \in A} w_{ij}\, \delta_{Fi}\, \delta_{Fj}}{\sum_{i,j \in A} w_{ij}} , \tag{3.1}$$

where the indices $i$ or $j$ label all pixels in the Lyα forest region. Here the weights are all set
equal to unity because the mock spectra are noiseless.
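
A minimal sketch of a weighted pair estimator of this kind is shown below. It is simplified to pixels along a single line of sight, binned by separation only, and it assumes that the pair weight factorizes as a product of per-pixel weights; these simplifications and the function name are illustrative and are not taken from [4] or [8].

```python
from collections import defaultdict

def correlation_estimator(positions, deltas, weights, bin_width):
    """Weighted average of delta_i * delta_j over all pixel pairs in each separation bin."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            bin_index = int(abs(positions[i] - positions[j]) // bin_width)
            pair_weight = weights[i] * weights[j]  # assumed factorized pair weight
            numerator[bin_index] += pair_weight * deltas[i] * deltas[j]
            denominator[bin_index] += pair_weight
    return {b: numerator[b] / denominator[b] for b in denominator}

# Toy line of sight: alternating fluctuations on a unit grid, unit weights as in noiseless mocks.
positions = [0.0, 1.0, 2.0, 3.0]
deltas = [1.0, -1.0, 1.0, -1.0]
weights = [1.0, 1.0, 1.0, 1.0]
xi = correlation_estimator(positions, deltas, weights, bin_width=1.0)
```

With unit weights the estimator reduces to a plain average of the products over the pairs in each bin, which is the case used for the noiseless mocks here.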

In Figure 4 we show the mean measurement of the correlation function in the $N = 100$
realizations, before and after adding the HCD systems, after averaging the original bins into
wider ones with $\Delta r = 5\,h^{-1}$ Mpc and $\Delta\mu = 0.2$. The errorbars are computed for the mean,
equal to the dispersion among realizations divided by $\sqrt{N - 1}$.

There are two main effects caused by the HCD systems on the measured correlation
function: the measurement error is increased, and the correlation function is systematically
altered from the true value of the Lyα forest alone. We first quantify the increase in the
statistical error in §3.1, and then we study the systematic effects on the two Lyα bias
parameters in §3.2. We shall not examine in this paper the systematic effect on the inferred shape
of the linear power spectrum in equation (2.1). However, as seen in Figure 4, the systematic
change in the correlation function is not limited to the values of the bias parameters but
can affect the broadband shape in a generic way, even at very large scales because of the
extended damped wings of HCDs. Therefore, any other constraints that are obtained from
measurements of the Lyα forest power spectrum are in general subject to a correction for
the impact of HCDs.

[Figure 4 panel title: "Effect of HCD on the correlation function"; horizontal axis: $r$ ($h^{-1}$ Mpc).]

Figure 4. Correlation function measured from mock spectra with inserted high column density
systems, shown in 3 angular bins, with errorbars indicating the error for the mean of 100 mocks with
200 square degrees each. Thin lines show the mean measured in the same mock spectra without the
HCD systems.

3.1 Increase of measurement errors 

Figure 5 shows the increase in the error of the measured correlation function due to HCD
systems. The error is for the mean of all the $N = 100$ realizations.

The increase of the errorbars in the absence of observational noise is approximately 30 to 50%.
Because errors are added quadratically, this means that the contribution from HCDs to the
total noise is comparable to that arising from the intrinsic small-scale variance of the Lyα
forest. In an actual survey like BOSS, the error budget of the correlation function also
includes observational noise. For instance, it was shown in [8] that a level of noise comparable to
that in the BOSS survey increases the errorbars by about 30%. In this case, the total error budget
has three comparable contributions from the intrinsic Lyα forest variance, HCD systems and
observational noise.
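
The quadrature argument can be made explicit with a short calculation. If the total error is $(1 + \epsilon)$ times the error without HCDs and the two contributions add in quadrature, the HCD contribution relative to the intrinsic one is $\sqrt{(1+\epsilon)^2 - 1}$. The function name below is illustrative.

```python
import math

def extra_error_fraction(total_increase):
    """Ratio sigma_extra / sigma_base when sigma_total = (1 + increase) * sigma_base
    and the two contributions add in quadrature."""
    return math.sqrt((1.0 + total_increase) ** 2 - 1.0)

low = extra_error_fraction(0.30)    # about 0.83
high = extra_error_fraction(0.50)   # about 1.12
```

An increase of 30 to 50% in the total error therefore corresponds to an added contribution of roughly 0.8 to 1.1 times the original error, which is the sense in which the two are comparable.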

We have performed some tests on the origin of the additional errors introduced by HCDs
in the correlation measurement. If DLAs (with $N_{\rm HI} > 10^{20.3}\,{\rm cm}^{-2}$) are eliminated from the
HCDs that are inserted (a model we designate as NO DLA, see Table 1 below), then the
increase of errors is reduced to 15%. If the damped wings of all the HCDs are eliminated
(keeping only their saturated Gaussian profiles, a model we designate as NO WINGS), then
the error increase is further reduced to 5 to 10% of the total. This shows that the damped
wings are the dominant reason for increased errors in the correlation measurements.

Figure 5. Errors in the correlation function measured from mock spectra with (thick lines) and
without (thin lines) high column density systems, in 3 angular bins. The errors have been multiplied
by $r$.

3.2 Systematic effect on the measured bias parameters 

We now discuss the result of fitting the correlation function in the mocks with inserted HCD
systems with the equation for the linear theory power spectrum,

$$P_F(k, \mu_k) = b_F^2 \left(1 + \beta_F \mu_k^2\right)^2 P_L(k) , \tag{3.2}$$

where now $b_F$ and $\beta_F$ are the bias parameters for the total absorption field, including Lyα
forest and HCD absorption.

The precise procedure for fitting the correlation function by $\chi^2$ minimization carried out
in [4] requires a complex and expensive calculation of the covariance matrix of the correlation
values measured in each pair of bins. Here we use a simplified procedure, where only the
diagonal elements of the covariance matrix of the binned correlation function are taken into
account to minimize $\chi^2$. This allows us to quickly examine a large number of realizations of
many different models. However, in order to obtain the errors in the fitted bias parameters,
we rely on bootstrap combinations of the 100 realizations of the survey. In other words, we
simply use the dispersion in the fitted values of the parameters that are obtained in different
realizations. Using the full covariance matrix should lead to reduced errors of the fitted
parameters, but we find that this error reduction is not large and that the systematic impact
of HCDs is adequately reflected in the results of the fits that use the diagonal elements of
the covariance matrix only.
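
A bootstrap over realizations of the kind mentioned above can be sketched as follows. The per-realization fit values are invented (Gaussian draws around 1.55 with dispersion 0.09) purely to have something to resample, and the resampling scheme shown is a generic one that may differ in detail from the one used for Table 1.

```python
import math
import random
import statistics

def bootstrap_error_of_mean(values, n_boot=2000, seed=0):
    """Dispersion of the mean over resamples of the realizations drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.pstdev(means)

rng = random.Random(3)
# Invented fitted values of a parameter in 100 realizations (illustration only).
beta_fits = [rng.gauss(1.55, 0.09) for _ in range(100)]

boot_err = bootstrap_error_of_mean(beta_fits)
# Direct estimate: dispersion among realizations divided by sqrt(N - 1).
direct_err = statistics.pstdev(beta_fits) / math.sqrt(len(beta_fits) - 1)
```

For independent realizations the two estimates agree to within the resampling noise, which is why the dispersion among realizations can stand in for the bootstrap error.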

The results of the fits for the bias parameters are listed in Table 1 for a variety of models
with different properties of the HCDs. The mean effective optical depth and the bias of the
selected pixels for inserting HCDs are listed in the first columns of Table 1. The additional variable $C$
given in the Table is the correlation of the Lyα and HCD transmission fluctuation at zero
separation [see eq. (4.2) in the next section, where this will be used; we note here that the
mean transmitted fraction $\bar F$ is slightly reduced by the insertion of HCDs according to eq.
(4.3), where $\bar F_H = \exp(-\bar\tau_{eH})$]. As shown in [4], the combination of parameters that is most
accurately determined from the 3D correlation function is $b_F (1 + \beta_F)$. We therefore list the
average best fit values of $b_F$, $\beta_F$ and $b_F(1 + \beta_F)$, each one with the error directly obtained from
the bootstrap analysis. We note that the linear function used to fit the correlation function
in equation (3.2) neglects the non-linear term $D(k, \mu_k)$ that is present in the power spectrum
used to generate the mocks (eq. 2.1). We therefore do not expect to recover exactly the input
values of the bias factors even when no HCDs are included. To minimize the non-linear
effects, our fit to the correlation function uses only bins at $r > 10\,h^{-1}$ Mpc.



Model       $\bar\tau_{eH}$   $b_h$   $C$      $b_F$               $\beta_F$        $b_F(1+\beta_F)$
NO HCD      --                --      --       -0.1472 ± 0.0005    1.550 ± 0.009    -0.3756 ± 0.0003
FIDUCIAL    0.017             1.21    0.0034   -0.1678 ± 0.0013    1.374 ± 0.018    -0.3984 ± 0.0008
HIGH BIAS   0.016             1.43    0.0032   -0.1732 ± 0.0009    1.306 ± 0.011    -0.3994 ± 0.0009
NO DLA      0.009             1.21    0.0029   -0.1563 ± 0.0006    1.495 ± 0.011    -0.3902 ± 0.0004
NO WINGS    0.007             1.21    0.0033   -0.1519 ± 0.0006    1.546 ± 0.010    -0.3867 ± 0.0005

Table 1. Fitted bias parameters for mocks with different models for the inserted HCDs. The variable
$C$ is defined in the next section.

The first model in Table 1 (labeled NO HCD) does not include any HCD systems. The
recovered values of the bias parameters are very close to the input ones, $b_\alpha = -0.1375$ and
$\beta_\alpha = 1.58$. The small differences are due to the non-linear term. The second row is for our
fiducial model, where HCDs are added following the column density distribution described
in the previous section in regions of high optical depth with a spectral filling factor $\nu = 0.01$,
corresponding to a bias factor $b_h = 1.21$. The HCDs in the fiducial model induce an increase
of the bias parameter $b_F$ of 14%, and a reduction of the redshift distortion parameter $\beta_F$ of
11%.

The third model, labeled HIGH BIAS, forces the HCDs into a smaller fraction of the
Lyα forest spectra, $\nu = 0.002$, increasing their bias factor to $b_h = 1.43$. The value of $b_F$ now
increases by 18% and $\beta_F$ decreases by 16%. The systematic impact of HCDs is therefore
increased as their bias factor increases. We recall that here we have to keep the value of
$\beta_h$ for the HCDs equal to that of the Lyα forest, because of the way they are inserted in the
mock spectra. In reality $\beta_h$ should be close to $1/b_h$ and, as we shall see below, this should
further enhance the impact of the HCDs on the value of $\beta_F$.

The fourth row gives the result for a model where only systems with a column density
$N_{\rm HI} < 10^{20.3}\,{\rm cm}^{-2}$ are included (labeled NO DLA), using again $\nu = 0.01$. In other words,
the systems generally referred to as damped Lyα systems are not included. The lower column
density systems producing the remaining effect, even though they are not generally identified
as DLAs, produce weak damped absorption wings as well. These weak systems are
responsible for an increase of the bias factor of 6% and a decrease of $\beta_F$ of 3.5%, i.e., about 30 to
40% of the total effect of all the systems in the FIDUCIAL model. In a survey with spectra
with the resolution and signal-to-noise of BOSS, most of the systems with $N_{\rm HI} > 10^{20.3}\,{\rm cm}^{-2}$
can be individually identified and removed from the sample in order to test for their impact
on the correlation function, but most of the systems with lower column densities cannot
be reliably identified. Therefore, removing the identified DLAs from the sample will not
completely eliminate the systematic errors in the Lyα correlation induced by HCDs. We
note here that if one chooses to eliminate all the spectra containing DLAs in a Lyα forest
survey like BOSS, then the measured correlation is also systematically biased in a different
way because the Lyα forest is correlated with the presence of HCDs.

The final model (fifth row, labeled NO WINGS) inserts all the HCDs only with their
Gaussian profiles (with the fixed Doppler parameter $b_D = 70\,{\rm km\,s^{-1}}$), with no damped wings.
In this case the impact on the recovered bias parameters is smaller, especially for $\beta_F$, which
is practically not affected. This shows that the main impact of HCDs on the correlation
function is through the damped wings of the absorbers.
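
The percentage changes quoted in this section follow directly from the central values in Table 1, as the short check below illustrates. The dictionary simply transcribes the $b_F$ and $\beta_F$ columns; the helper names are illustrative.

```python
# Central values of (b_F, beta_F) transcribed from Table 1.
TABLE_1 = {
    "NO HCD": (-0.1472, 1.550),
    "FIDUCIAL": (-0.1678, 1.374),
    "HIGH BIAS": (-0.1732, 1.306),
    "NO DLA": (-0.1563, 1.495),
    "NO WINGS": (-0.1519, 1.546),
}

def percent_change(model):
    """Percentage change of b_F and beta_F relative to the NO HCD row."""
    b_ref, beta_ref = TABLE_1["NO HCD"]
    b, beta = TABLE_1[model]
    return 100.0 * (b / b_ref - 1.0), 100.0 * (beta / beta_ref - 1.0)

def combined_bias(model):
    """The best-constrained combination b_F (1 + beta_F)."""
    b, beta = TABLE_1[model]
    return b * (1.0 + beta)

changes = {name: percent_change(name) for name in TABLE_1 if name != "NO HCD"}
fiducial_combination = combined_bias("FIDUCIAL")
```

For the FIDUCIAL row this gives an increase of about 14% in $b_F$ and a decrease of about 11% in $\beta_F$, while the NO WINGS row changes $\beta_F$ by well under 1%. The combination $b_F(1+\beta_F)$ shifts by only about 6%, since the two changes partly compensate.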

4 Analytical description 

We now present an analytical formulation to evaluate the effect of the high column density 
systems (HCDs) on the correlation function of the flux transmission fraction, F. Even when 
the analytical results require making certain approximations, they are highly useful to provide 
an interpretation of the numerical results and an understanding of the dependence to be 
expected with any variations of the model for the column density distribution and bias 
parameters of the HCDs. 

We start by introducing some useful notation. The transmitted fraction at a point $\mathbf{x}$ in
a spectrum is $F(\mathbf{x}) = \bar F\, [1 + \delta_F(\mathbf{x})]$, where $\bar F$ is the mean value of $F$ at a certain redshift.
We divide this total transmission into a contribution $F_H$ from HCDs, defined as absorption
systems with a column density $N_{\rm HI} > 1.6 \times 10^{17}\,{\rm cm}^{-2}$ (i.e., a continuum optical depth greater
than unity at the Lyman limit), and a contribution $F_\alpha$ from the Lyα forest, defined as all
the remaining Lyα absorption by atomic hydrogen. Hence, $F(\mathbf{x}) = F_H(\mathbf{x})\, F_\alpha(\mathbf{x})$. We ignore
here the presence of metal lines; these will be considered briefly in §5. We note that the
precise column density at which this conventional separation between Lyα forest and HCDs
is made does not affect our results. The important point is that the Lyα forest absorption
is dominated by systems with much lower column density than the Lyman limit threshold,
and the HCD absorption is dominated by systems with much higher column density than
this threshold. Therefore, the precise choice for the threshold is not crucial.

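Let the transmitted fraction of the Lyα forest be $F_\alpha(\mathbf{x}) = \bar F_\alpha\, [1 + \delta_\alpha(\mathbf{x})]$, and the
transmitted fraction of the HCDs be $F_H(\mathbf{x}) = \bar F_H\, [1 + \delta_H(\mathbf{x})]$. We then have,

$$F(\mathbf{x}) = \bar F\, [1 + \delta_F(\mathbf{x})] = F_\alpha(\mathbf{x})\, F_H(\mathbf{x}) = \bar F_\alpha\, [1 + \delta_\alpha(\mathbf{x})]\; \bar F_H\, [1 + \delta_H(\mathbf{x})] . \tag{4.1}$$

Being tracers of the same underlying density field, the fields $\delta_\alpha$ and $\delta_H$ are correlated,

$$C = \langle \delta_\alpha(\mathbf{x})\, \delta_H(\mathbf{x}) \rangle \neq 0 , \tag{4.2}$$

and the relation between $\bar F$ and $\bar F_\alpha$ is

$$\bar F = \langle F \rangle = \bar F_\alpha\, \bar F_H\, (1 + C) . \tag{4.3}$$

Hence, the variable $\delta_F(\mathbf{x})$ can be expressed as

$$1 + \delta_F(\mathbf{x}) = \frac{F(\mathbf{x})}{\bar F} = \frac{[1 + \delta_\alpha(\mathbf{x})]\, [1 + \delta_H(\mathbf{x})]}{1 + C} . \tag{4.4}$$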
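
The relations (4.2) to (4.4) can be verified numerically on any pair of correlated transmission fields. In the sketch below the two fields are toy constructions that share a Gaussian mode, with amplitudes 0.2 and 0.02 chosen arbitrarily; they are not the mock spectra of Section 2, and the only purpose is to check the algebra.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
F_alpha, F_H = [], []
for _ in range(20000):
    g = rng.gauss(0.0, 1.0)  # shared mode that correlates the two fields
    F_alpha.append(math.exp(-0.2 * math.exp(g)))
    F_H.append(math.exp(-0.02 * math.exp(g + rng.gauss(0.0, 1.0))))

mean_alpha, mean_H = mean(F_alpha), mean(F_H)
delta_alpha = [f / mean_alpha - 1.0 for f in F_alpha]
delta_H = [f / mean_H - 1.0 for f in F_H]

# Zero-separation cross-correlation, as in equation (4.2).
C = mean([a * h for a, h in zip(delta_alpha, delta_H)])

# Mean of the total transmission, to compare with equation (4.3).
mean_total = mean([fa * fh for fa, fh in zip(F_alpha, F_H)])
predicted_mean = mean_alpha * mean_H * (1.0 + C)

# Total fluctuation from equation (4.4); its mean should vanish.
delta_F = [(1.0 + a) * (1.0 + h) / (1.0 + C) - 1.0 for a, h in zip(delta_alpha, delta_H)]
```

Because both fields absorb more where the shared mode is large, $C$ comes out positive, and the factor $(1 + C)$ is exactly what is needed for $\delta_F$ to average to zero.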

4.1 Impact on the correlation function



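We can now write an expression for the correlation function of $\delta_F$ at two points $\mathbf{x}_1$ and $\mathbf{x}_2$
as a function of the separation $\mathbf{r}_{12} = \mathbf{x}_1 - \mathbf{x}_2$:

$$\begin{aligned}
1 + \xi_F(\mathbf{r}_{12}) &= \langle\, [1 + \delta_F(\mathbf{x}_1)]\, [1 + \delta_F(\mathbf{x}_2)]\, \rangle \\
&= (1 + C)^{-2}\, \langle\, [1 + \delta_\alpha(\mathbf{x}_1)]\, [1 + \delta_H(\mathbf{x}_1)]\, [1 + \delta_\alpha(\mathbf{x}_2)]\, [1 + \delta_H(\mathbf{x}_2)]\, \rangle \\
&= (1 + C)^{-2}\, \big[\, 1 + 2C + \xi_\alpha(\mathbf{r}_{12}) + 2\xi_{\alpha H}(\mathbf{r}_{12}) + \xi_H(\mathbf{r}_{12}) \\
&\qquad\qquad\quad + 2\xi_{3\alpha}(\mathbf{r}_{12}) + 2\xi_{3H}(\mathbf{r}_{12}) + \xi_4(\mathbf{r}_{12}) \,\big] ,
\end{aligned} \tag{4.5}$$

where we have defined:

$$\begin{aligned}
\xi_\alpha(\mathbf{r}_{12}) &= \langle \delta_\alpha(\mathbf{x}_1)\, \delta_\alpha(\mathbf{x}_2) \rangle , \\
\xi_{\alpha H}(\mathbf{r}_{12}) &= \langle \delta_\alpha(\mathbf{x}_1)\, \delta_H(\mathbf{x}_2) \rangle , \\
\xi_H(\mathbf{r}_{12}) &= \langle \delta_H(\mathbf{x}_1)\, \delta_H(\mathbf{x}_2) \rangle , \\
\xi_{3\alpha}(\mathbf{r}_{12}) &= \langle \delta_\alpha(\mathbf{x}_1)\, \delta_H(\mathbf{x}_1)\, \delta_\alpha(\mathbf{x}_2) \rangle , \\
\xi_{3H}(\mathbf{r}_{12}) &= \langle \delta_\alpha(\mathbf{x}_1)\, \delta_H(\mathbf{x}_1)\, \delta_H(\mathbf{x}_2) \rangle , \\
\xi_4(\mathbf{r}_{12}) &= \langle \delta_\alpha(\mathbf{x}_1)\, \delta_H(\mathbf{x}_1)\, \delta_\alpha(\mathbf{x}_2)\, \delta_H(\mathbf{x}_2) \rangle .
\end{aligned} \tag{4.6}$$

[Figure 6 panel title: "Contribution of each term".]

Figure 6. The total correlation function $\xi_F$ and the contribution of the four largest terms in equation
(4.6) in the numerical mock spectra.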

We can compute the contribution of each of the six terms in equation (4.5) for the case of
the mock spectra discussed in Sections 2 and 3. Figure 6 shows these contributions for our
FIDUCIAL model, for the largest four terms only. The remaining two terms are substantially
smaller and are omitted, and some of the terms are shown multiplied by a factor of 5 or 10
for better visualization. As seen in this figure, the leading correction due to the presence of
HCDs is the term $\xi_{\alpha H}$, although the next largest term, $\xi_{3\alpha}$, is not much smaller.
4.2 Effective bias parameters



The full expression for the overall transmitted fraction correlation in equation (4.5) is highly
complex, depending on a combination of 2, 3 and 4-point functions of the fields. While
the 3-point and 4-point functions defined in equation (4.6) are difficult to characterize, the
two-point correlation terms can be calculated in the limit of large scales assuming that linear
theory applies. Then, each Fourier mode is multiplied by the factor $1 + \beta_i \mu_k^2$, with $\beta_i$ being
the redshift distortion factor of any tracer $i$ [3] and $\mu_k$ the cosine of the angle between the
wavevector and the line of sight. With this in mind, we first aim to solve
analytically the case when the 3-point and 4-point functions are negligible. Note, however,
that in the case of our FIDUCIAL model in the numerical mocks, the 3-point term $\xi_{3\alpha}$ is
actually larger than $\xi_H$, and is smaller than $\xi_{\alpha H}$ by a factor of only ~5, as shown in Figure
6. Furthermore, assuming linear theory is not actually accurate in this case, because the
damped wings of the absorbers extend the correlation in the radial direction up to scales
that can be arbitrarily large depending on the shape of the correlation function. The results
obtained by considering only the linear 2-point terms should therefore be considered as no
more than an initial guide, to be revised by the effects of the other terms (in particular $\xi_{3\alpha}$
for the observationally more relevant cases) and the validity of the linear approximation.

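We therefore separate equation (4.5) as

$$ \xi_F(\mathbf{r}_{12}) = \xi_2(\mathbf{r}_{12}) + \xi_{34}(\mathbf{r}_{12}) , \tag{4.7} $$

where the 2-point contribution is

$$ \xi_2(\mathbf{r}_{12}) = \frac{\xi_\alpha(\mathbf{r}_{12}) + 2 \xi_{\alpha H}(\mathbf{r}_{12}) + \xi_H(\mathbf{r}_{12})}{(1 + C)^2} , \tag{4.8} $$

and all the remaining 3-point and 4-point terms are included in $\xi_{34}$,

$$ \xi_{34}(\mathbf{r}_{12}) = \frac{2 \xi_{3\alpha}(\mathbf{r}_{12}) + 2 \xi_{3H}(\mathbf{r}_{12}) + \xi_4(\mathbf{r}_{12}) - C^2}{(1 + C)^2} . \tag{4.9} $$

Note that in the limit of large separation, $\xi_4$ approaches $C^2$ and therefore $\xi_{34}$ vanishes.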
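The large-separation limit of the 3-point and 4-point terms can be made explicit. As a sketch, assuming only that the fields at $\mathbf{x}_1$ and $\mathbf{x}_2$ become statistically independent as $r_{12} \to \infty$, and that $\langle \delta_\alpha \rangle = \langle \delta_H \rangle = 0$ by definition, the averages in equation (4.6) factorize:

```latex
\begin{aligned}
\xi_{3\alpha} &\to \langle \delta_\alpha(\mathbf{x}_1)\,\delta_H(\mathbf{x}_1) \rangle \,
                   \langle \delta_\alpha(\mathbf{x}_2) \rangle = C \cdot 0 = 0 , \\
\xi_{3H}      &\to \langle \delta_\alpha(\mathbf{x}_1)\,\delta_H(\mathbf{x}_1) \rangle \,
                   \langle \delta_H(\mathbf{x}_2) \rangle = C \cdot 0 = 0 , \\
\xi_4         &\to \langle \delta_\alpha(\mathbf{x}_1)\,\delta_H(\mathbf{x}_1) \rangle \,
                   \langle \delta_\alpha(\mathbf{x}_2)\,\delta_H(\mathbf{x}_2) \rangle = C^2 , \\
\xi_{34}      &\to \frac{0 + 0 + C^2 - C^2}{(1 + C)^2} = 0 .
\end{aligned}
```

The constant $-C^2$ in the numerator of $\xi_{34}$ is therefore what makes the total correlation vanish at large separation.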



Now, the Fourier transforms of the 2-point correlations $\xi_\alpha$, $\xi_H$ and $\xi_{\alpha H}$ yield their
corresponding power spectra, which can be expressed in terms of the bias factors for each
field. As described in [14] and [2], the general bias parameters of an absorption field arise by
considering the first order expansion of the mean transmission in a large-scale,
linear region that is conditioned to have density and peculiar velocity gradient perturbations
$\delta$ and $\eta$. Therefore, in analogy to the Lya forest, the bias factors for the HCD absorption
field $\delta_H$ are

$$ b_H = \frac{1}{\bar{F}_H} \frac{\partial F_H}{\partial \delta} , \qquad b_{\eta H} = \frac{1}{\bar{F}_H} \frac{\partial F_H}{\partial \eta} , \tag{4.10} $$

and the redshift distortion parameter is

$$ \beta_H = \frac{b_{\eta H} \, f(\Omega)}{b_H} , \tag{4.11} $$

where $f(\Omega)$ is the logarithmic derivative of the growth factor (see [3]). In linear theory, each
Fourier mode of any tracer field is multiplied by the factor $b_i (1 + \beta_i \mu_k^2)$ relative to the mass
field, and therefore the power spectra are

$$ P_\alpha(k, \mu_k) = b_\alpha^2 (1 + \beta_\alpha \mu_k^2)^2 P_L(k) , \tag{4.12} $$

$$ P_H(k, \mu_k) = b_H^2 (1 + \beta_H \mu_k^2)^2 P_L(k) , \tag{4.13} $$

$$ P_{\alpha H}(k, \mu_k) = b_\alpha b_H (1 + \beta_\alpha \mu_k^2)(1 + \beta_H \mu_k^2) P_L(k) . \tag{4.14} $$

Finally, the power spectrum $P_2$ of the 2-point terms is

$$ \frac{P_2(k, \mu_k)}{P_L(k)} = \frac{b_\alpha^2 (1 + \beta_\alpha \mu_k^2)^2 + 2 b_\alpha b_H (1 + \beta_\alpha \mu_k^2)(1 + \beta_H \mu_k^2) + b_H^2 (1 + \beta_H \mu_k^2)^2}{(1 + C)^2} = b_2^2 (1 + \beta_2 \mu_k^2)^2 , \tag{4.15} $$

where we have defined

$$ b_2 = \frac{b_\alpha + b_H}{1 + C} , \tag{4.16} $$

and

$$ b_2 \beta_2 = \frac{b_\alpha \beta_\alpha + b_H \beta_H}{1 + C} . \tag{4.17} $$

In the absence of the term $\xi_{34}$ and for linear theory, the relation between the bias factors
measured from the observed correlation and that of the unpolluted Lya forest is therefore
remarkably simple. In principle, both $b_H$ and $\beta_H$ can be measured from the observable
cross-correlation of the HCDs with the Lya forest, and then the systematic effect of the HCDs on
the total correlation can be corrected to obtain the Lya forest bias parameters.

4.3 Relation to the bias of host halos 

Whereas most of the Lya forest absorption at z > 2 is associated with density fluctuations 
in the intergalactic medium forming an interconnected structure, the high column density 
systems should correspond to discrete, clearly identifiable overdense regions that have
gravitationally collapsed, or halos. The natural question that arises is the relation of the bias
factors defined in equation (4.10) to the host halo bias. We now address this question in 
order to predict the bias factors and the impact of the HCDs on the total power spectrum 
from equation (4.15). 

We define new bias factors for the HCDs, which can be defined in an equivalent way for
any other population of absorbers,

$$ b'_H = \frac{1}{\bar{\tau}_{eH}} \frac{\partial \tau_{eH}}{\partial \delta} , \qquad b'_{\eta H} = \frac{1}{\bar{\tau}_{eH}} \frac{\partial \tau_{eH}}{\partial \eta} , \tag{4.18} $$

where $\tau_{eH} = -\log F_H$ is the effective optical depth averaged over a large-scale region with
mean values of the density and peculiar velocity gradient perturbations $\delta$ and $\eta$, and $\bar{\tau}_{eH}$
is its average over all the universe, for $\delta = \eta = 0$. The effective optical depth is obtained
by averaging $F_H$ first, and then taking the logarithm. The bias is defined analogously to
equation (4.10), but using the effective optical depth instead. These new bias factors for
an absorption field can be interpreted in the usual way that bias factors for a collection of
objects are interpreted: if the mean density perturbation increases by a fractional amount
$\delta$, while $\eta$ is kept fixed, the mean effective optical depth of HCD absorption increases by a
fractional amount $b'_H \delta$.






If the bias factor of the HCD host halos is $b_h$, this means that their number density
should fluctuate on large scales as $\delta_h = b_h \delta$. If we now assume that the probability of
observing an HCD when a halo of fixed mass is intercepted is independent of its large-scale
environment (i.e., independent of $\delta$ and $\eta$), then the perturbation in the effective optical
depth contributed by HCDs should be the same as that in the halo number density. In
other words, we should have $b'_H = b_h$, and $b'_{\eta H} = 1$ in redshift space. More generally, this
assumption holds only if the following two conditions are met:

1. The probability that the absorption profile of any HCD appears substantially blended
with another one in the absorption spectrum is small. Here, substantially blended
means that their profiles overlap in a region where their absorption optical depth is not
much smaller than unity. This condition should in general be correct if $\bar{\tau}_{eH} \ll 1$ and
the clustering of HCDs is not very strong.

2. The probability distribution of the column density in a halo of a fixed mass is
independent of its large-scale environment and is isotropic (i.e., it is independent of
$\delta$ and $\eta$). In other words, the gas radial profile does not depend on the environment
for fixed $M_h$, and the axes of any non-spherical gas distribution in the halos are not
correlated with the principal axes of the deformation tensor of the surrounding
large-scale structure. This assumption is unlikely to be exactly true, because galaxy disks
are known to be statistically aligned with the axes of their large-scale environment,
which can affect their redshift distortion anisotropy [15], but the effect is probably very
small.



Under these conditions, the transmission fluctuations due to HCDs should obey $\delta_H(\mathbf{x}) =
-\bar{\tau}_{eH} \delta_h(\mathbf{x})$, and the bias factors are related by

$$ b_H = -\bar{\tau}_{eH} b'_H = -\bar{\tau}_{eH} b_h , \qquad \beta_H = \frac{b'_{\eta H} f(\Omega)}{b'_H} = \frac{f(\Omega)}{b_h} . \tag{4.19} $$

In the rest of this Section, we assume that these relations hold, which is reasonable for HCDs
in view of the conditions that are required, and that $\bar{\tau}_{eH} \ll 1$. Note that these new bias
factors can be defined for the Lya forest in the same way, and that they also provide a
measure of the relative fluctuations in the Lya effective optical depth in comparison to the
relative fluctuations in the mass density, but the Lya forest absorption cannot be associated
with halos and the two assumptions above are not correct for this case.

4.4 Corrections for the two-point linear terms 

Using the results above, we can now derive the correction to the bias factors measured from
the total absorption field that includes the Lya forest and the HCDs, to obtain the corrected
bias factors of the Lya forest:

$$ \Delta b \equiv b_2 - b_\alpha = \frac{b_\alpha + b_H}{1 + C} - b_\alpha = - \frac{\bar{\tau}_{eH} b_h + C b_\alpha}{1 + C} \simeq -\bar{\tau}_{eH} b_h , \tag{4.20} $$

$$ \Delta (b\beta) \equiv b_2 \beta_2 - b_\alpha \beta_\alpha = - \frac{\bar{\tau}_{eH} f(\Omega) + C b_\alpha \beta_\alpha}{1 + C} \simeq -\bar{\tau}_{eH} f(\Omega) , \tag{4.21} $$

$$ \Delta \beta \equiv \beta_2 - \beta_\alpha = \frac{b_\alpha \beta_\alpha - \bar{\tau}_{eH} f(\Omega)}{b_2 (1 + C)} - \beta_\alpha = \frac{\bar{\tau}_{eH} [\beta_\alpha - f(\Omega)/b_h]}{b_\alpha / b_h - \bar{\tau}_{eH}} , \tag{4.22} $$

where the last expressions in equations (4.20) and (4.21) assume $C \ll 1$ ($C$ is usually also
substantially smaller than $\bar{\tau}_{eH}$ if this effective optical depth from HCDs is dominated by
damped wings, which have little correlation with the Lya forest). In Table 1 we show the
value of $C$ for the different mocks. Finally, the correction on the parameter that is best
constrained from the 3D power spectrum is

$$ \Delta (b + b\beta) \simeq -\bar{\tau}_{eH} [b_h + f(\Omega)] . \tag{4.23} $$



The simple bias parameter corrections in equations (4.20), (4.21), (4.22) and (4.23) are plotted
in Figure 7 as a function of $\bar{\tau}_{eH}$, for several values of $b_h$, and for the Lya forest bias values
used in our numerical mocks, $b_\alpha = -0.1375$ and $\beta_\alpha = 1.58$.
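The two-point, linear-theory corrections can be evaluated with a few lines of code. The sketch below assumes the host-halo relations $b_H = -\bar{\tau}_{eH} b_h$ and $\beta_H = f(\Omega)/b_h$, together with the effective parameters $b_2 = (b_\alpha + b_H)/(1+C)$ and $b_2\beta_2 = (b_\alpha\beta_\alpha + b_H\beta_H)/(1+C)$. The function name, the default `f_omega=1.0` and the halo bias `b_h = 2.0` used in the example are illustrative assumptions, not values from the paper.

```python
def hcd_bias_corrections(tau_eH, b_h, b_alpha, beta_alpha, f_omega=1.0, C=0.0):
    """Corrections to the forest bias parameters from the two-point linear terms.

    f_omega=1.0 is only a placeholder for the growth-rate factor f(Omega);
    C is the zero-lag cross-correlation of the forest and HCD fluctuations.
    """
    b_H = -tau_eH * b_h                      # HCD transmission bias (host-halo relation)
    beta_H = f_omega / b_h                   # HCD redshift distortion parameter
    b_2 = (b_alpha + b_H) / (1.0 + C)        # effective bias of the total field
    b2_beta2 = (b_alpha * beta_alpha + b_H * beta_H) / (1.0 + C)
    beta_2 = b2_beta2 / b_2
    return {
        "delta_b": b_2 - b_alpha,
        "delta_b_beta": b2_beta2 - b_alpha * beta_alpha,
        "delta_beta": beta_2 - beta_alpha,
    }

# Forest values quoted for the mocks; tau_eH and b_h are illustrative inputs.
tau_eH, b_h, b_alpha, beta_alpha = 0.017, 2.0, -0.1375, 1.58
corr = hcd_bias_corrections(tau_eH, b_h, b_alpha, beta_alpha)

# Closed form for the change in beta, which does not depend on C.
closed_form = tau_eH * (beta_alpha - 1.0 / b_h) / (b_alpha / b_h - tau_eH)
print(corr, closed_form)
```

For these inputs the bias becomes more negative and $\beta$ decreases, which is the qualitative behaviour described for the HCD contamination.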

The effect of the HCDs can be mitigated if a large fraction of them can be individually
detected in the absorption spectra, when the signal-to-noise is good enough. In a survey with
similar characteristics as BOSS, one should be able to detect and mask most of the DLAs
(with $N_{HI} > 10^{20.3}$ cm$^{-2}$), considerably reducing the value of $\bar{\tau}_{eH}$. The lower lines in Figure
2 show $1 - \bar{F}_H \simeq \bar{\tau}_{eH}$ when only systems of lower column densities are left. The effective
optical depth from these systems is still about one third of that of all the HCDs.

However, masking and removing some of the HCDs before the flux correlation is
measured may introduce potential problems. The HCDs are correlated with the Lya forest, so
if a portion of the spectrum around each HCD is simply eliminated from the data, we are
introducing a bias that may be comparable to or worse than the one caused by the HCDs
themselves. Moreover, the HCDs that are detected may suffer from selection effects induced by
the superposition of their damped wings with the Lya forest. For example, HCDs living in
large-scale regions with different values of $\eta$ may have different probabilities of being detected
because the Lya forest absorption around them depends on $\eta$, and this would change the
derived values of the $b_\eta$ and $\beta$ parameters. If DLAs are masked and removed, one should
therefore take particular care to simulate the effect of this procedure precisely in numerical
mocks, and to check whether the noise they introduce in the measurements can in fact be
substantially reduced while any systematic effects induced by their removal are properly
corrected for.



4.5 Application to the measurement on mock spectra 

Our simple analytical estimate for the systematic effect of HCDs on the bias parameters can
now be compared to the numerical results obtained previously from mock spectra in the last
section. Note that in our numerical mocks, $\beta_H = \beta_\alpha$ and $b'_{\eta H} = b_h \beta_\alpha / f(\Omega) \neq 1$, so equations
(4.20), (4.21), (4.22) and (4.23) cannot be used, or must be modified to include the correct
$b'_{\eta H}$. Instead, we can use directly equations (4.16) and (4.17). The parameters for our fiducial
mocks are $b_h = 1.21$, $\beta_H = \beta_\alpha = 1.58$, $C = 0.0034$, $b_\alpha = -0.1375$ and $\bar{\tau}_{eH} = 0.017$, which
yield

$$ \Delta b = -0.02 , \qquad \Delta \beta = 0 , \qquad \Delta [b (1 + \beta)] = -0.052 . \tag{4.24} $$
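The numbers for the fiducial mocks can be reproduced directly from the quoted parameters. This is a minimal sketch that assumes the effective parameters are $b_2 = (b_\alpha + b_H)/(1+C)$ and $b_2\beta_2 = (b_\alpha\beta_\alpha + b_H\beta_H)/(1+C)$ with $b_H = -\bar{\tau}_{eH} b_h$; the variable names are ours.

```python
# Fiducial mock parameters quoted in the text.
b_alpha, beta_alpha = -0.1375, 1.58
b_h, beta_H = 1.21, 1.58          # the mocks impose beta_H = beta_alpha
tau_eH, C = 0.017, 0.0034

b_H = -tau_eH * b_h                                   # HCD transmission bias
b_2 = (b_alpha + b_H) / (1.0 + C)                     # effective bias of the total field
beta_2 = (b_alpha * beta_alpha + b_H * beta_H) / (b_alpha + b_H)

delta_b = b_2 - b_alpha
delta_beta = beta_2 - beta_alpha
delta_b_one_plus_beta = b_2 * (1.0 + beta_2) - b_alpha * (1.0 + beta_alpha)

print(f"{delta_b:.3f} {delta_b_one_plus_beta:.3f}")   # change in b and in b(1 + beta)
print(abs(delta_beta) < 1e-12)                        # beta is unchanged when beta_H = beta_alpha
```

Because both fields share the same redshift distortion parameter in these mocks, the two-point linear model predicts no change in $\beta$; only the amplitude $b$ and the combination $b(1+\beta)$ shift.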
Figure 7. Correction to bias parameters as a function of halo bias $b_h$. Solid, dashed and dotted
lines correspond to different values of $C$ (0, 0.01, 0.005). Crosses show the analytical prediction for the
FIDUCIAL mocks, and the actual measurement from the mocks.



These corrections from the simple two-point, linear model are shown as crosses in Figure 7.
Clearly, other effects must be important in changing the best fit bias factors in the mocks,
in which the value of $\beta$ is substantially decreased by the HCDs.

There are two reasons for the discrepancies between the analytically predicted
corrections to the bias parameters and the actual corrections found in the mocks. One is the
presence of the 3-point term $\xi_{3\alpha}$, the dominant one among the terms in $\xi_{34}$ (eq. 4.9) that
are not taken into account. However, Figure 6 shows that this term contributes only ~2%
to the total correlation function, and is ~5 times smaller than the two-point terms, so it
is surprising that its impact on the fitted values of the bias parameters of the numerically
obtained correlation function may be so large.

A clue to what is going on is found in Table 1, where the model NO WINGS (in which
the damped wings of the HCDs are eliminated) shows practically no correction on $\beta_F$. This
model has a value of $\bar{\tau}_{eH} = 0.0068$, so we would expect the correction on $\beta_F$ to be about
three times smaller than for the FIDUCIAL model under no other changes, but the change
on $\beta_F$ is actually 30 times smaller. This strongly suggests that the reason for the discrepancy
is to be found in the effect of the damped wings.



4.6 The non-linear effect of the damped wings 



The damped wings imply that the linear theory approximation is not actually valid even 
on very large scales. The damped absorption falls as the inverse square of the line-of-sight 
separation, a dependence that is comparable to the rate of decrease of the correlation function. 
Therefore, the actual correlation function that is observed is not the linear theory expression 
derived from the power spectra in equation (4.15), but is the convolution along the line of 
sight of this linear correlation function with the Voigt profile of the average column density 
in HCDs that is correlated with the Lya forest absorption. 

Quantitatively, the linear form of the cross-correlation $\xi_{\alpha H}(x_\perp, v)$, where the parallel
component of the separation $v$ is assumed to be expressed as a velocity, assumes that the
absorption of a HCD is localized into an interval much narrower than the separation $v$. The
damped wings imply that this is not actually true, and that the absorption has the Voigt
profile $V(N, v' - v)$ at a velocity $v'$, for an absorber located at $v$ with a column density $N$.
Then, if the probability distribution of $N$ is proportional to $f(N)$, and neglecting any effects
of blending of the HCD profiles among themselves, the cross-correlation of the Lya and HCD
absorption is modified to a function $\tilde{\xi}_{\alpha H}$ equal to

$$ \tilde{\xi}_{\alpha H}(x_\perp, v) = \frac{\int dv' \int_{N_{LL}}^{\infty} dN \, f(N) \, V(N, v' - v) \, \xi_{\alpha H}(x_\perp, v')}{\int_{N_{LL}}^{\infty} dN \, f(N) \, W(N) / \lambda_\alpha} , \tag{4.25} $$

where $W(N)$ is the rest-frame equivalent width of an absorber of column density $N$, and
$N_{LL} = 10^{17.2}$ cm$^{-2}$. The linear form for the cross-correlation in redshift space is readily
obtained from the equations in [16], replacing $\beta$ by $(\beta_\alpha + \beta_H)/2$ and $\beta^2$ by $\beta_\alpha \beta_H$, and the
convolution in the above equation can be computed and used in the model to fit the observed
correlation function, by approximating $\xi_F \simeq \xi_2$ and replacing $\xi_{\alpha H}$ by $\tilde{\xi}_{\alpha H}$ in equation (4.8).
Although we have not carried out this procedure for our mocks in this paper, our analysis in
this section and the results in Figure 6 indicate that if this improved model is directly fitted
to the data, the effects of HCDs ought to be corrected for to a high accuracy.
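The line-of-sight convolution described above can be illustrated with a discrete sketch. Everything in it is an assumption made for illustration: a single column density, so that the integral over $N$ drops out; a Lorentzian-like wing standing in for the full Voigt profile; a kernel normalized to unit sum, playing the role of the equivalent-width normalization; and a toy input correlation. The names `damped_kernel` and `smooth_along_line_of_sight` are hypothetical.

```python
def damped_kernel(velocities_kms, width_kms):
    """Toy Lorentzian-like wing 1 / (1 + (v / width)^2), normalized to unit sum."""
    raw = [1.0 / (1.0 + (v / width_kms) ** 2) for v in velocities_kms]
    norm = sum(raw)
    return [r / norm for r in raw]

def smooth_along_line_of_sight(xi_values, kernel):
    """Discrete line-of-sight convolution of xi with the kernel.

    Pixels beyond the ends of the grid reuse the edge value (clamped boundary).
    """
    half = len(kernel) // 2
    n = len(xi_values)
    out = []
    for i in range(n):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            total += weight * xi_values[j]
        out.append(total)
    return out

# Velocity grid in km/s, and a toy correlation peaked at zero separation.
velocities = [-2000.0 + 100.0 * k for k in range(41)]
xi_linear = [0.01 / (1.0 + (v / 300.0) ** 2) for v in velocities]

kernel = damped_kernel(velocities, width_kms=500.0)
xi_smoothed = smooth_along_line_of_sight(xi_linear, kernel)

center = len(velocities) // 2
print(xi_linear[center], xi_smoothed[center])
```

The smoothing lowers the central value and spreads the signal along the line of sight, which is the qualitative effect attributed to the damped wings. A realistic model would integrate over the column density distribution $f(N)$ with the full Voigt profile.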

A similar approach was used in the Appendix B of [17]. 
5 Effect of Metal Lines



Apart from HCDs, the other contaminating absorption that is found in the spectral range 
where the Lya forest is observed is from absorption lines of ionized heavy elements, or 
metals. When the transmission fraction is measured at a certain pixel, part of the absorption 
may be caused by any of the numerous intergalactic metal lines that are present in the 
ultraviolet spectrum. For metal lines with wavelengths close to that of the hydrogen Lya 
line, their absorption can overlap with nearby, and therefore correlated, Lya absorption. This 
introduces new components to the total transmission correlation, which will appear as peaks 
in the three-dimensional correlation centered on the line of sight at a velocity separation 
corresponding to the rest-frame wavelength separation of the metal and Lya line. In the 
same way, pairs of metal lines overlapping the Lya forest will also introduce peaks in the 
correlation function at the wavelength separation of each pair of metal lines. 

Although we shall not treat the impact of metals in detail in this paper, we introduce 
here the general formalism for metal line corrections to the overall transmission correlation 
that is analogous to the one used for HCDs, which we believe will be useful for treating 
their effect. Let the transmitted fraction field of each one of the metal lines, $i$, that may
appear in the spectral region of the Lya forest, be $F_i = \bar{F}_i (1 + \delta_i)$. In this section, the
Lya transmission is not separated into a Lya forest and HCD part, for simplicity. The total
transmission fraction field that is measured is

$$ F = F_\alpha \prod_i F_i ; \tag{5.1} $$

$$ \bar{F} (1 + \delta_F) = \bar{F}_\alpha (1 + \delta_\alpha) \prod_i \bar{F}_i (1 + \delta_i) . \tag{5.2} $$

We now define the following two-point cross-correlations:

$$ \xi_{\alpha i}(\mathbf{x}) = \langle \delta_\alpha(\mathbf{r}) \delta_i(\mathbf{r} + \mathbf{x}) \rangle + \langle \delta_\alpha(\mathbf{r}) \delta_i(\mathbf{r} - \mathbf{x}) \rangle ; \tag{5.3} $$

$$ \xi_{ij}(\mathbf{x}) = \langle \delta_i(\mathbf{r}) \delta_j(\mathbf{r} + \mathbf{x}) \rangle + \langle \delta_i(\mathbf{r}) \delta_j(\mathbf{r} - \mathbf{x}) \rangle . \tag{5.4} $$

As usual, brackets denote ensemble averages over all pixel pairs separated by $\mathbf{x}$. Let $\lambda_i$ be
the central wavelength of each metal line $i$, and let the vector $\mathbf{x}_i$ be directed along the line
of sight with its radial component equal to $x_i = c H^{-1} (\lambda_i - \lambda_\alpha) / \lambda_\alpha$. The symmetrized
cross-correlation $\xi_{\alpha i}$ then has two peaks, at $\mathbf{x} = \pm \mathbf{x}_i$, corresponding to the possibilities of having
the Lya line near $\mathbf{r}$ and the metal line near $\mathbf{r} + \mathbf{x}$, or the Lya line near $\mathbf{r} - \mathbf{x}$ and the metal
line near $\mathbf{r}$ [and two peaks at $\pm(\mathbf{x}_i - \mathbf{x}_j)$ for $\xi_{ij}$]. In the linear regime, these cross-correlations
are simply the Fourier transforms of analogous power spectra to those in equations (4.13)
and (4.14), which can be modeled in terms of bias factors and redshift distortion factors for
each metal line, $b_i$ and $\beta_i$, defined as in equations (4.10) and (4.11), and related to the bias
factors of their host halos and their mean effective optical depth in full analogy to equations
(4.19) for the HCDs. Therefore, the two-point correlations $\xi_{\alpha i}$ and $\xi_{ij}$ are fully specified by
these factors as long as the linear regime approximation is valid.
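The position of a metal-line peak follows from the rest-frame wavelength offset. As a small sketch, the velocity separation $c (\lambda_i - \lambda_\alpha)/\lambda_\alpha$ can be computed as below; the rest wavelengths are standard laboratory values (Lya at 1215.67 Å, Si III at 1206.50 Å), and the function name is ours. Converting the velocity to the radial separation $x_i$ additionally requires the Hubble factor $H$, which is left out here.

```python
C_KMS = 299792.458          # speed of light in km/s
LAMBDA_LYA = 1215.67        # Lya rest wavelength in Angstrom
LAMBDA_SI_III = 1206.50     # Si III rest wavelength in Angstrom

def velocity_offset_kms(lambda_metal, lambda_lya=LAMBDA_LYA):
    """Velocity separation c (lambda_i - lambda_alpha) / lambda_alpha in km/s."""
    return C_KMS * (lambda_metal - lambda_lya) / lambda_lya

dv_si_iii = velocity_offset_kms(LAMBDA_SI_III)
print(round(dv_si_iii, 1))   # negative: Si III lies blueward of Lya
```

The symmetrized cross-correlation then peaks at plus and minus this offset along the line of sight, roughly 2260 km/s for Si III.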

We define also:

$$ C_m \equiv \frac{\bar{F}}{\bar{F}_\alpha \prod_i \bar{F}_i} = 1 + \sum_i \xi_{\alpha i}(0) + \sum_{ij} \xi_{ij}(0) + \cdots , \tag{5.5} $$

where we have omitted the three-point and higher-point functions. Note that $C_m$ is nearly
equal to unity, because the displacements $x_i$ are usually large (the largest contributions to
$C_m$ should arise from lines with a wavelength close to that of Lya, e.g., from Si III), so the
correlations at zero separation are very small.

The observed absorption perturbation field $\delta_F$ is then

$$ C_m \delta_F = (1 + \delta_\alpha) \prod_i (1 + \delta_i) - C_m . \tag{5.6} $$

Evaluating the correlation of $\delta_F$, and keeping only the most important terms, which are the
two-point functions and the three-point term that contains only one metal contribution, we
have

$$ C_m^2 \, \xi_F(\mathbf{x}) = -(C_m - 1)^2 + \xi_\alpha(\mathbf{x}) + \sum_i \xi_{\alpha i}(\mathbf{x}) + \sum_i \xi_{ii}(\mathbf{x}) + \sum_{i<j} \xi_{ij}(\mathbf{x}) + \sum_i \langle \delta_\alpha(\mathbf{r}) \delta_\alpha(\mathbf{r} \pm \mathbf{x}) \delta_i(\mathbf{r}) \rangle , \tag{5.7} $$

where the symbol $\pm$ implies here that there are two terms in the sum, one with the + sign
and one with the - sign.

Ignoring the last three-point term, which is probably very small (and may be computed
numerically in simple models where metal lines are added in mocks with a similar prescription
as the one we have used for HCDs, and can probably also be modeled as a product of the
two correlations $\xi_{\alpha i}$ and $\xi_\alpha$), this shows that the effect of every metal line is fully specified
by the parameters $b_i$ and $\beta_i$, which can be related to the physical bias parameters of the host
halos through the mean effective optical depth of every line, $\bar{\tau}_{ei}$. All these parameters should
be directly measurable from the data, by extracting the shape of the peaks of the overall
transmission correlation near the peak positions $\mathbf{x}_i$. Obviously these measurements will not
be possible for weak metal lines as the peaks they create become buried in the noise, but
it should still be possible to obtain combined measurements of the parameters for a set of
metal lines that are assumed to be hosted by the same halos (and therefore have the same
values of $b'_i$ and $\beta_i$ as in equations 4.19). This suggests a program to investigate all of the
metal lines that can be statistically detected: measure
their parameters $\bar{\tau}_{ei}$, $b_i$ and $\beta_i$, and use the formalism presented here to correct for their
impact on the overall transmission correlation function.

6 Conclusions 

The measurement of the bias and redshift distortion parameters of the Lya forest correlation 
function may reveal essential characteristics of the physical evolution of the intergalactic 
medium. However, their values are affected by the presence of the absorption profiles from 
high column density systems (HCDs) and metal lines in the observed spectra. It is therefore 
necessary to study how these systems affect the correlation function to try to correct the 
measured parameters for their effect. 

We have presented a numerical method to simulate the effect of HCDs on the
measured correlation function of the Lya absorption, in which the Voigt profiles of absorbers
are inserted in mock spectra of the Lya forest in positions that are correlated with the Lya
optical depth. We have evaluated the increase of the noise in the correlation measurement 
and the systematic change that is introduced in the recovered bias parameters owing to 
the correlation of HCDs with the Lya forest. Both effects are very substantial: the HCD
contribution to the noise is close to that of the Lya forest itself, and the bias parameters are
altered by ~10% in the models we have used. Most of the increase in the noise
is caused by damped Lya systems that can individually be identified in the spectra, and
therefore removed to reduce the noise. However, this should be considered a risky operation,
because the systems that are detected may have different bias factors,
and therefore cause different systematic effects, than the set of all HCDs. If the removal of
detected HCDs is attempted (either by masking regions of the spectrum where HCDs are 
present or fitting their profiles as part of the continuum), one should examine the difference 
in the results of the Lya correlation measurements when no HCDs are removed, and test the 
selection of HCDs and their effects through simulations using mock spectra in which HCDs 
are inserted with the observed correlation with the Lya forest. 

An analytical formalism has also been developed to more generally predict the changes 
induced by HCDs on the inferred linear bias parameters. We find that the most important 
terms biasing the correlation function can be computed from the two-point cross-correlations 
of the Lya and HCD absorption perturbation fields. Assuming the validity of linear theory 
and that HCDs are associated with their host halos in a way that is independent of the 
large-scale environment, we have inferred a set of simple equations relating the corrections 
on the bias factors to the effective optical depth and the host halos bias factor of the HCDs. 
The density bias factor is increased in absolute value by the product of the mean effective 
optical depth and the bias factor of the host halos of the HCDs (eq. 4.20), and the redshift 
distortion parameter is also altered in proportion to $\bar{\tau}_{eH}$ and the difference $\beta_\alpha - \beta_H$ (eq. 4.22).

Even though the results that can be derived analytically go in the direction of the 
correction to the fitted density bias factor that we find numerically for the specific model 
of the mocks we have analyzed, they do not quantitatively agree. We have identified two 
reasons for this discrepancy: one is the neglect of the three-point and four-point terms in the 
analytical approach, in particular the term $\xi_{3\alpha}$ in equations (4.5) and (4.6). The other one is
the deviation from the linear theory form of the cross-correlation of the Lya forest and HCD
absorption due to the extended nature of the damped absorption profiles. The latter effect, in
particular, should mostly explain the change in the total flux redshift distortion parameter,
$\beta_F$, in our mocks (see Table 1), despite the fact that by construction, the HCDs are inserted
with a distribution obeying $\beta_H = \beta_\alpha$. The damped wing profiles are acting analogously to
"fingers of God" in galaxy redshift surveys to distort the contours of the correlation function
and reduce the fitted value of $\beta_F$.

The results of this paper lead us to believe that it is possible to accurately correct
for the contamination introduced by HCDs in the total transmission correlation function
$\xi_F$, to infer the true underlying Lya correlation function $\xi_\alpha$, and therefore to constrain our
models for the evolution of the intergalactic medium. In principle, by simply convolving the
linear theory form of the cross-correlation function $\xi_{\alpha H}$ with a Voigt profile in the radial
direction that results from the mean column density in HCDs associated with a Lya forest
transmission perturbation $\delta_\alpha$, one should have a much more accurate model to be fitted to the
observations. The effect of the next largest term, $\xi_{3\alpha}$, can be numerically taken into account
once the parameters for the HCD model (mainly $b_H$, $\beta_H$ and $\bar{\tau}_{eH}$) have been calibrated
through the observational determination of the Lya forest - HCD cross-correlation.

Although we have not attempted an evaluation of the impact of metal lines on the
correlation function in this paper, we have proposed a basic formalism to understand and correct
for their effect that is similar to the one for HCDs. For the metal lines that contaminate the
spectral region of the Lya forest, the metal-Lya cross-correlations can also be measured from
the data and their effect included when the observations of the overall flux correlation $\xi_F$ are
fitted. We are therefore optimistic about the prospects for obtaining accurate measurements of
the Lya correlation as a probe of the large-scale primordial perturbations and the physical
evolution of the intergalactic medium from the BOSS and other future spectroscopic surveys
of the Lya forest.
A Appendix: Clustering of the HCD systems

In this Appendix we show that the method described in Section 2 distributes the HCDs
following a correlation function $\xi_h(\mathbf{r})$ in redshift space that is, on large scales, proportional to
the flux correlation function $\xi_F(\mathbf{r})$. This implies that the redshift space distortion parameters
of the two correlations are equal, $\beta_h = \beta_\alpha$. Furthermore, the ratio of these two correlations
yields the relation between the bias factors of the distribution of HCDs and of the transmitted
flux fraction field, $b_h$ and $b_F$,

$$ \xi_h(\mathbf{r}) = \left( \frac{b_h}{b_F} \right)^2 \xi_F(\mathbf{r}) . \tag{A.1} $$

To simplify our notation in this Appendix, we use $F$, $\bar{F}$ and $\xi_F$ to refer to the Lya
forest variables that in the main text are referred to as $F_\alpha$, $\bar{F}_\alpha$ and $\xi_\alpha$.

The method to generate the mock Lya absorption field, described in detail in [8] and
summarized in Section 2, generates first a random Gaussian field $\delta_g$ with a correlation function
$\xi_g(\mathbf{r})$, such that the final flux field $F(\delta_g)$ has the desired correlation $\xi_F(\mathbf{r})$, as well as the
desired probability distribution function $p_F(F)$. We first prove that this auxiliary Gaussian
field has the same correlation function as the flux field, with a different bias parameter $b_g$.
Then we show that the peaks of the Gaussian field also have the same correlation function,
with a relative bias set by the peak threshold.
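A.1 Biases of the Gaussian field

The relation between the correlation of any function $F(\delta_g)$ and $\xi_g$ can be computed as follows:

$$ \begin{aligned}
\bar{F}^2 [1 + \xi_F(r_{12})] &= \langle F(\mathbf{x}_1) F(\mathbf{x}_2) \rangle = \int_0^1 dF_1 \int_0^1 dF_2 \, p_F(F_1, F_2) \, F_1 F_2 \\
&= \int_{-\infty}^{\infty} d\delta_{g1} \int_{-\infty}^{\infty} d\delta_{g2} \, p_g(\delta_{g1}, \delta_{g2}) \, F(\delta_{g1}) \, F(\delta_{g2}) \\
&= \int_{-\infty}^{\infty} d\delta_{g1} \int_{-\infty}^{\infty} d\delta_{g2} \,
   \frac{\exp\left[ - \dfrac{\delta_{g1}^2 + \delta_{g2}^2 - 2 \delta_{g1} \delta_{g2} \, \xi_g(r_{12})}{2 \, [1 - \xi_g^2(r_{12})]} \right]}{2\pi \sqrt{1 - \xi_g^2(r_{12})}} \,
   F(\delta_{g1}) \, F(\delta_{g2}) .
\end{aligned} \tag{A.2} $$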

In the method to generate the Lya mocks, this expression is inverted to find $\xi_g$ as a function
of the desired $\xi_F$. In this Appendix we use it to show that, on large scales, the two correlation
functions are proportional to each other.

The Gaussian variables $\delta_{g1} = \delta_g(\mathbf{x}_1)$ and $\delta_{g2} = \delta_g(\mathbf{x}_2)$ are normalized to unit
dispersion and their correlation is $\xi_g(r_{12})$. We define the new normal variables $y_1$, $y_2$ as linear
combinations that are independent:

$$ \delta_{g1} = y_1 ; \qquad \delta_{g2} = \xi_g \, y_1 + \sqrt{1 - \xi_g^2} \; y_2 . \tag{A.3} $$



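In the linear regime, we can assume $\xi_g \ll 1$ and use a first-order expansion of the function
$F(\delta_{g2})$:

$$ F(\delta_{g2}) \simeq F(y_2) + \frac{dF}{d\delta_{g2}} \, y_1 \xi_g = F(y_2) \left( 1 - \frac{d\tau}{d\delta_{g2}} \, y_1 \xi_g \right) , \tag{A.4} $$

where $\tau = -\ln(F)$ and the derivatives are evaluated at $\delta_{g2} = y_2$.

The flux correlation in equation (A.2) now becomes,

$$ \bar{F}^2 [1 + \xi_F(r_{12})] \simeq \int_{-\infty}^{\infty} dy_1 \, p_g(y_1) F(y_1) \int_{-\infty}^{\infty} dy_2 \, p_g(y_2) F(y_2) \left( 1 - \frac{d\tau}{dy_2} \, y_1 \xi_g \right) , \tag{A.5} $$

where $p_g(y)$ is the one-dimensional, normal Gaussian distribution. Requiring now that
$\xi_F(r_{12}) = (b_F / b_g)^2 \, \xi_g(r_{12})$, we find that the ratio of bias parameters is fully determined
by the transformation $F(\delta_g)$:

$$ \left( \frac{b_F}{b_g} \right)^2 = - \frac{1}{\bar{F}^2} \int_{-\infty}^{\infty} dy_1 \, p_g(y_1) F(y_1) \, y_1 \int_{-\infty}^{\infty} dy_2 \, p_g(y_2) F(y_2) \frac{d\tau}{dy_2} . \tag{A.6} $$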



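The mocks used in this paper are computed using a lognormal transformation for the
optical depth $\tau$,

$$ F(\delta_g) = \exp[-\tau(\delta_g)] = \exp\left[ -a \, e^{\gamma \delta_g} \right] , \tag{A.7} $$

with $a = 0.077$ and $\gamma = 2.16$. Using this transformation, the ratio of the bias factors is

$$ \left( \frac{b_F}{b_g} \right)^2 = \ldots \tag{A.8} $$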
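The bias ratio for the lognormal transformation can be estimated numerically. The sketch below assumes the transformation $F = \exp[-a\, e^{\gamma \delta_g}]$ with $a = 0.077$ and $\gamma = 2.16$, and evaluates the ratio as minus the product of the two one-dimensional Gaussian averages $\langle F y \rangle$ and $\langle F \, d\tau/dy \rangle$, divided by $\bar{F}^2$. The trapezoidal integration, its range and the helper names are our own choices, and the result is a rough numerical estimate rather than the value quoted in the paper.

```python
import math

A, GAMMA = 0.077, 2.16      # lognormal parameters quoted for the mocks

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def tau(y):
    """Optical depth as a function of the Gaussian variable."""
    return A * math.exp(GAMMA * y)

def flux(y):
    return math.exp(-tau(y))

def gauss_average(func, y_max=8.0, steps=16000):
    """Trapezoidal estimate of the integral of p_g(y) * func(y) over [-y_max, y_max]."""
    h = 2.0 * y_max / steps
    total = 0.0
    for k in range(steps + 1):
        y = -y_max + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(y) * func(y)
    return total * h

mean_flux = gauss_average(flux)
first = gauss_average(lambda y: flux(y) * y)                     # <F y>, negative
second = gauss_average(lambda y: flux(y) * GAMMA * tau(y))       # <F dtau/dy>, positive
ratio_sq = -first * second / mean_flux ** 2                      # (b_F / b_g)^2
bias_ratio = -math.sqrt(ratio_sq)                                # b_F is negative

print(mean_flux, ratio_sq, bias_ratio)
```

Combined with $b_F = -0.1375$, the resulting ratio gives a Gaussian-field bias $b_g$ that can be compared with the peak biases derived in the next subsection.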
A.2 Biases of the peaks



In Section 2 we describe a method to distribute HCD systems in regions of the Lya spectra
where the optical depth is above a threshold $\tau_c$ or, equivalently, above a threshold $\delta_{gc}$ in the
Gaussian variable used to generate the optical depth. We refer to these regions as peaks. The
threshold sets the fraction $\nu$ of pixels that are candidates to host a HCD:

$$ \nu = \int_{\delta_{gc}}^{\infty} d\delta_g \, p_g(\delta_g) . \tag{A.9} $$



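The correlation function of these peaks, $\xi_h$, is related to the probability of having a
peak both at $\delta_{g1} = \delta_g(\mathbf{x}_1)$ and at $\delta_{g2} = \delta_g(\mathbf{x}_2)$:

$$ P(\delta_{g1} > \delta_{gc}, \delta_{g2} > \delta_{gc}) = \nu^2 [1 + \xi_h(r_{12})] = \int_{\delta_{gc}}^{\infty} d\delta_{g1} \int_{\delta_{gc}}^{\infty} d\delta_{g2} \, p_g(\delta_{g1}, \delta_{g2}) . \tag{A.10} $$

We now express $\delta_{g1}$ and $\delta_{g2}$ as a function of the independent normal variables $y_1$ and
$y_2$ defined in equation (A.3). Defining $\delta'_{gc}(y_1, \xi_g) = (\delta_{gc} - \xi_g y_1)(1 - \xi_g^2)^{-1/2}$, we obtain

$$ \nu^2 [1 + \xi_h(r_{12})] = \int_{\delta_{gc}}^{\infty} dy_1 \, p_g(y_1) \int_{\delta'_{gc}(y_1, \xi_g)}^{\infty} dy_2 \, p_g(y_2) \simeq \nu^2 \left[ 1 + \left( \frac{b_h}{b_g} \right)^2 \xi_g(r_{12}) \right] , \tag{A.11} $$

where $b_g$ and $b_h$ are the bias parameters of the Gaussian field and the peaks, respectively.
The bias ratio is then a function of $\nu$ only:

$$ \left( \frac{b_h}{b_g} \right)^2 = \frac{p_g(\delta_{gc})}{\nu^2} \int_{\delta_{gc}}^{\infty} dy_1 \, p_g(y_1) \, y_1 . \tag{A.12} $$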



For the values used in the models in this paper, we find $b_h / b_g = 2.66$ for $\nu = 0.01$, and
$b_h / b_g = 3.18$ for $\nu = 0.002$. Using equation (A.8) and the value of the Lya forest bias in our
mocks, $b_F = -0.1375$, we find the bias of the HCD systems to be $b_h = 1.21$ for $\nu = 0.01$, and
$b_h = 1.43$ for $\nu = 0.002$.

Finally, because the correlation of the peaks has been shown to be proportional to the 
correlation of the Gaussian field, their redshift distortion parameter must be the same. 
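The quoted peak bias ratios can be checked with the standard library. This is a sketch assuming that the bias ratio has the form $(b_h/b_g)^2 = p_g(\delta_{gc}) \, \nu^{-2} \int_{\delta_{gc}}^{\infty} dy \, p_g(y) \, y$; since the remaining integral equals $p_g(\delta_{gc})$ for a unit Gaussian, this reduces to $b_h / b_g = p_g(\delta_{gc}) / \nu$. The function name is hypothetical.

```python
from statistics import NormalDist

def peak_bias_ratio(nu):
    """b_h / b_g for peaks above the threshold that leaves a fraction nu of pixels."""
    gauss = NormalDist()                      # unit Gaussian, as assumed for delta_g
    delta_gc = gauss.inv_cdf(1.0 - nu)        # threshold with upper-tail probability nu
    return gauss.pdf(delta_gc) / nu

ratio_01 = peak_bias_ratio(0.01)
ratio_002 = peak_bias_ratio(0.002)
print(round(ratio_01, 2), round(ratio_002, 2))
```

Both values agree with the ratios quoted above to within about 0.01, which is consistent with rounding and with small differences in how the threshold is evaluated.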